Apache Hive Reviews & Ratings 2024

Overview

What is Apache Hive?

Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.

Recent Reviews

TrustRadius Insights

December 15, 2023

Apache Hive is a versatile software that has been widely used across various departments and organizations for different use cases. It has …

With Apache Hive, you can enter the world of Big Data

8 out of 10

July 06, 2022

On-premises large data processing is handled by Apache Hive, which is running on Cloudera servers. In order to use Apache Hive, you must …

Best Distributed Database in the market

6 out of 10

April 19, 2022

Incentivized

We use Apache Hive to store a large set of data, which are huge documents such as problem statements and their answers, not only submitted by …

Help your dev team!

8 out of 10

April 12, 2022

Incentivized

We build our data lake and perform queries on large amounts of data. We group data from multiple sources into a common structure, making …

Spectacular SQL-like interface for accessing Hadoop

9 out of 10

April 11, 2022

Incentivized

To manage and view Apache Hadoop data in a SQL-like format; to query databases across the organization quickly; to query data …

This system makes active data of value.

8 out of 10

April 09, 2022

Incentivized

We have used the system to migrate data either for new versions or because we will use another operating program, the software helps us to …

Best query platform for ETL.

6 out of 10

April 08, 2022

Incentivized

I used Apache Hive on top of Hadoop for filtering and cleaning data using SQL. It was part of the project I was working on. …

It is an advance to the ease of the processes

8 out of 10

April 08, 2022

Incentivized

The software is intuitive from the first steps, one of the first features we take into account for the software does not allow duplicate …

Capabilities of Apache Hive

8 out of 10

April 07, 2022

Incentivized

Our main purpose for using Apache Hive was to get insights from data. We analyze the data and use it to make informed business decisions. …

Excellent bigdata warehouse solution

9 out of 10

April 07, 2022

Incentivized

Apache Hive is an open-source data warehouse solution built on top of Hadoop that helps to analyze a very large amount of data.
Our use …

very useful for OLTP

10 out of 10

April 06, 2022

Incentivized

We use Apache to process large data and get the output with less process time. The framework is very much useful for data processing and …

Apache Hive

9 out of 10

November 24, 2021

Incentivized

1. Used Apache Hive to create external and internal tables in Hadoop / BigData projects on Cloudera and Azure platforms. 2. Apache Hive …

Walk into the World of Big Data with Apache Hive

9 out of 10

June 02, 2021

Incentivized

We are using Apache Hive over an on-premise big data setup built on top of Cloudera servers. Use case behind using Apache Hive [it] is …

Reliable and Cheaper one stop Data warehouse solution

9 out of 10

December 28, 2020

Incentivized

I have used Apache Hive in [the] last 3 companies and it's being used by the multiple departments spread across data analytics, …

Big Data the SQL way

8 out of 10

September 23, 2020

Incentivized

I am working as a Research Assistant where I have to process tons of data to produce appropriate findings. Our NLP lab used it for all its …

Apache Hive: Big data querying tool w/SQL interface, but slower, more costly computation

7 out of 10

September 21, 2020

Incentivized

We use Apache Hive to make data-driven decisions. It is used from finance to engineering to sales. It helps aggregate our massive data …



Pricing


Apache Hive

N/A

Unavailable


Entry-level set up fee?

No setup fee

Offerings

Free Trial
Free/Freemium Version
Premium Consulting/Integration Services


Alternatives Pricing

ClicData

$79

per month

What is ClicData?

ClicData is a 100% cloud-based business intelligence platform that allows users to connect, process, blend, visualize and share data from a single place. As an automated platform, users are able to rely on the latest version of company data, to ensure users make the right decisions. Hundreds of…

retailMetrix

$399

per month per installation

What is retailMetrix?

RetailMetrix is a data analytics platform for retailers with the mission of enabling retailers to get value from their data. RetailMetrix processes and stores sales, labor and customer data using data warehouse technologies. Its dashboards and reports allow teams to find the data that matters to…


Product Demos

Apache Hive Hadoop Ecosystem - Big Data Analytics Tutorial by Mahesh Huddar

YouTube

Connecting Microsoft Power BI to Apache Hive using Simba Hive ODBC driver

YouTube

Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive

YouTube


Product Details

Apache Hive Technical Details

Operating Systems	Unspecified
Mobile Application	No

Frequently Asked Questions

Reviewers rate Usability highest, with a score of 8.5.

The most common users of Apache Hive are from Mid-sized Companies (51-1,000 employees).

Reviews and Ratings

(97)

December 15th 2023

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources.

Business Problems Solved

Apache Hive is versatile software that has been widely used across various departments and organizations for different use cases. It has proven to be particularly helpful in handling large datasets, migrating data between different operating systems, synchronizing programs, and fetching and generating product metrics. Users have found value in using Hive for data analytics, engineering, data science, product management, and IT-related tasks such as improving analysis of big datasets stored in Hadoop HDFS.

Furthermore, Apache Hive has simplified the process of filtering and cleaning data using SQL, reducing the learning curve for handling big data. It allows users to run SQL queries against data in Hadoop, enabling efficient analysis of large datasets without the need to learn a new language. Additionally, Hive has been utilized for building reports, analyzing data stored in the Hadoop file system, processing events gathered in HDFS, and converting them into Parquet files for fast querying.

Overall, users have praised Apache Hive for its scalability, accessibility, and cost-effectiveness in storing and retrieving analytics data. It has provided an intuitive solution for storing large datasets, querying big sets of data using SQL, aggregating massive datasets into distilled information for data-driven decision making, and creating external and internal tables in Hadoop/BigData projects. With its ability to process both unstructured and structured data efficiently, Hive has become an essential tool for data analysts, engineers, and business analysts across organizations.

Reviews

July 06, 2022

With Apache Hive, you can enter the world of Big Data

Verified User

Engineer in Engineering

Computer Software Company, 201-500 employees

Score 8 out of 10

Vetted Review

Verified User

Use Cases and Deployment Scope

On-premises large data processing is handled by Apache Hive, which is running on Cloudera servers. To use Apache Hive, you need a distributed system that handles queries efficiently and speeds them up through parallel execution. Metrics like user information and purchase history are stored in HDFS and then queried through Apache Hive.

Pros and Cons

A simple query language built on MapReduce.
Parallelism across a distributed system is provided.
Tabular format, with interfaces available for all cloud platforms.

Due to the shuffled data, complex joins may take a long time to complete.
Execution is dependent on external storage and memory.

Likelihood to Recommend

Data warehouses that update and append records in batches or in real time can be queried using Apache Hive. Query results can feed Tableau and other reporting tools, or be used directly from Python. Structured data and tables can be accessed using SQL-like syntax. Using Hive, you can build tables at various levels of the data lake. It is not the best fit for transactional databases.

April 19, 2022

Best Distributed Database in the market

Prasanna Kumar TR

Developer and Site Contributor

ForgetCode.com (Computer Software, 1-10 employees)

Score 6 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We use Apache Hive to store a large set of data, which are huge documents such as problem statements and their answers, submitted not only by the site owners but also by the users of the site.

Pros and Cons

It is easy to store unstructured data
Easy to retrieve using SQL queries instead of other, more complicated ways
Large sets of data can be stored efficiently

Apache Hive could provide more flexibility for integration.

Likelihood to Recommend

Apache Hive isn't really useful when we store only small data sets, so sometimes our usage isn't suitable for Hive. We are planning to move to SQL databases if this continues.

April 12, 2022

Help your dev team!

Verified User

C-Level Executive in Information Technology

Computer Software Company, 11-50 employees

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We build our data lake and perform queries on large amounts of data. We group data from multiple sources into a common structure, making it easy for our developers to perform complex queries without leaving the simple framework provided by SQL. Although the deployment is not easy, once we have the infrastructure, the work is greatly simplified.

Pros and Cons

Simplify query to devs
Organize data
Batch process

Deploy
Maintenance
Support

Likelihood to Recommend

It is great for laboratory environments and for starting to work with unstructured data when we are not yet sure how we want to treat it. It also allows queries to be improved very quickly by letting developers work with SQL instead of MapReduce. One area for improvement: in production environments, troubleshooting is complicated and requires expert personnel.

April 11, 2022

Spectacular SQL-like interface for accessing Hadoop

Verified User

Engineer in Engineering

Telecommunications Company, 10,001+ employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

To manage and view Apache Hadoop data in a SQL-like format
To query databases across the organization quickly
To query data for use in Spark projects
To save queries

Pros and Cons

Easy-to-use, interactive modern layout
Easy to organize data and view tables and views from across the organization
Fast speed for most queries

Some queries, particularly complex joins, are still quite slow and can take hours
Previous jobs and queries are not stored sometimes
Switching to Impala can sometimes be time-consuming (i.e. the system hangs, or is slow to respond).
Sometimes, directories and tables don't load properly which causes confusion

Likelihood to Recommend

Apache Hive is well-suited for querying Hadoop. If you use Hadoop you should consider Hive. It is well-suited for large organizations where there is lots of data that needs to be queried. However, there is significant overhead in setting up and maintaining Hive (and Hadoop in general). Small companies and individuals should consider other means of storing data, such as a SQL database.

April 08, 2022

Best query platform for ETL.

Omkar Marne

Research Application Software Engineer

University of North Carolina at Charlotte (Higher Education, 1001-5000 employees)

Score 6 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

I used Apache Hive on top of Hadoop for filtering and cleaning data using SQL. It was part of the project I was working on. Apache Hive gives us a SQL-like platform where we can run SQL queries. Apache Hive was a perfect choice for cleaning data as we were using Apache Hadoop and both are Apache products.

Pros and Cons

Filtering data
Cleaning data
SQL-like interface
Integrates with Hadoop

Uses a lot of memory
Not compatible with other databases like PostgreSQL and MySQL
Limited support
Slow compared to other interfaces

Likelihood to Recommend

Apache Hive is best for ETL (extract, transform, load) purposes. It gives its best performance when integrated with the Hadoop Distributed File System. It's also very good for performing mathematical operations and when the data is organized and structured. It can handle large volumes of data (petabytes) but requires a lot of memory in the system. It supports both unstructured and structured data but works best with structured data.

April 08, 2022

It is an advance to the ease of the processes

Pablo Gonzalez

Internet Marketing Manager

MKTi México (Marketing & Advertising, 51-200 employees)

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

The software is intuitive from the first steps. One of the first features we took into account is that the software does not allow duplicate files to be stored. It is advanced software that constantly learns and develops from the data. The first phase is very effective: the information is analyzed and verified in detail.

Pros and Cons

The unification of the data will help to establish the commercial criteria.
We are sure that the data is protected

If you try to extract an excessive amount of data, the system will become slow
You may have the danger that the system collapses due to the amount of data

Likelihood to Recommend

In addition to making information quickly accessible through the established security protocols, it has helped us as users to maintain a fairly comfortable data processing flow. It is more cost-effective to process the data in batches, and we have been able to unify data from different sources.

April 07, 2022

Capabilities of Apache Hive

Verified User

C-Level Executive in Product Management

Entertainment Company, 51-200 employees

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Our main purpose for using Apache Hive was to get insights from data. We analyze the data and use it to make informed business decisions. The interface also works much like SQL, so it is easy for a new person to understand.

Pros and Cons

It can be used to retrieve data from a database, as with SQL.
We can partition the data and distribute it among the machines in the cluster
Easily scalable, which makes it possible to run analytics at a larger scale

No support for working with unstructured data.
ACID properties are not followed as in a database, which often creates confusion
Supports OLAP environments only; OLTP is not supported
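
To make the partitioning point above concrete, here is a minimal in-memory sketch in Python. It only illustrates the idea that Hive stores each partition under a directory named `key=value`, so a query that filters on the partition key can skip the other partitions. It is not Hive code: the function names `partition_rows` and `prune` and the sample column `dt` are hypothetical, and distribution across cluster machines is not modeled.

```python
from collections import defaultdict


def partition_rows(rows, key):
    """Group rows by partition key, using Hive-style names such as 'dt=2022-04-07'."""
    parts = defaultdict(list)
    for row in rows:
        parts[f"{key}={row[key]}"].append(row)
    return dict(parts)


def prune(partitions, key, value):
    """Return only the rows of the matching partition; other partitions are never read."""
    return partitions.get(f"{key}={value}", [])


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    rows = [
        {"dt": "2022-04-07", "user": 1},
        {"dt": "2022-04-08", "user": 2},
        {"dt": "2022-04-07", "user": 3},
    ]
    parts = partition_rows(rows, "dt")
    print(sorted(parts))
    print(prune(parts, "dt", "2022-04-07"))
```

The design choice to note is that pruning is a dictionary lookup on the partition name rather than a scan of every row, which is roughly why filtering on a partition column tends to be cheaper than filtering on an ordinary column.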

Likelihood to Recommend

If you have a workforce that knows SQL and you need to explore large-scale data and get insights from it, then Apache Hive is perfect for you. If you have experienced people who have worked on big data before, then using Splunk is better. For starting the journey in data-driven decisions and data analytics, it is better to use Apache Hive first.

April 07, 2022

Excellent bigdata warehouse solution

Verified User

Program Manager in Information Technology

Information Technology & Services Company, 201-500 employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Apache Hive is an open-source data warehouse solution built on top of Hadoop that helps to analyze a very large amount of data.
Our use case/scope is to work on a large data analytics project where the data frequency and velocity are very high. Apache Hive is very useful in processing both unstructured and structured data in a seamless way. It reduces the need to write complex queries because it is built around SQL; we have an engineering team that is very proficient in writing SQL queries and uses Apache Hive to process the big data.
We have identified no business issues using the solution.

Pros and Cons

Apache Hive supports external data tables.
Supports data partitioning to improve overall performance.
Apache Hive is a reliable and scalable solution.
Apache Hive supports writing ad-hoc queries as well.

Apache Hive is not well suited for OLTP-based jobs.
We sometimes observed high latency while querying data.
Limited support for row-level data updates.
Training materials need improvement.

Likelihood to Recommend

Apache Hive is a data warehouse/ETL solution that is used for processing big data for analytics and visualizations. Apache Hive has a great architecture that makes it very well suited for organizations.
The Metastore stores metadata for each table and its schema. The Driver acts as a controller for the execution of statements. Other components, such as the Optimizer, the CLI, and the Thrift Server, enable the processing and transformation of big data.

April 06, 2022

very useful for OLTP

Verified User

Administrator in Information Technology

Health, Wellness and Fitness Company, 1001-5000 employees

Score 10 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We use Apache Hive to process large data and get the output with less processing time. The framework is very useful for data processing and analytics.

Pros and Cons

Used in the data warehouse much like ETL tools.
SQL-like interface gives access to data stored in various databases.
Enables analytics at massive scale.

The way the framework is developed could be improved.
OLTP is not supported.
Does not offer real time queries.

Likelihood to Recommend

Keeps queries running very fast and takes very little time to write Hive queries in comparison to MapReduce code. Very easy to write queries including joins in Hive.

November 24, 2021

Apache Hive

Surendranatha Reddy Chappidi

Senior Data Engineer

Maersk (Logistics & Supply Chain, 10,001+ employees)

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

1. Used Apache Hive to create external and internal tables in Hadoop / BigData projects on Cloudera and Azure platforms.
2. Apache Hive supports different file formats to create tables. Supported file formats are CSV, Parquet, Avro, and JSON.
3. Apache Hive can store billions of records in distributed storage and retrieve them efficiently.
4. Apache Hive used Spark, Tez, or MapReduce engines in the backend for computation.
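
As a rough illustration of the external versus internal (managed) tables mentioned above, the following Python sketch assembles a HiveQL `CREATE TABLE` statement as a string. The helper name `table_ddl`, the table `events`, its columns, and the path `/data/events` are all hypothetical. The sketch only builds text and does not connect to Hive. It restricts `STORED AS` to a few format keywords (PARQUET, AVRO, ORC, TEXTFILE) as a simplifying assumption; CSV and JSON tables typically need additional row-format or SerDe clauses that are not shown here.

```python
ALLOWED_FORMATS = {"PARQUET", "AVRO", "ORC", "TEXTFILE"}


def table_ddl(table, columns, file_format="PARQUET", location=None, partitions=None):
    """Build a HiveQL CREATE TABLE statement.

    If `location` is given, the table is declared EXTERNAL and points at that path;
    otherwise a managed (internal) table is declared.
    """
    if file_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format in this sketch: {file_format}")
    kind = "EXTERNAL TABLE" if location else "TABLE"
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    ddl = f"CREATE {kind} IF NOT EXISTS {table} (\n  {cols}\n)"
    if partitions:
        parts = ", ".join(f"`{name}` {dtype}" for name, dtype in partitions)
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += f"\nSTORED AS {file_format}"
    if location:
        ddl += f"\nLOCATION '{location}'"
    return ddl


if __name__ == "__main__":
    # Hypothetical table definition, for illustration only.
    print(table_ddl(
        "events",
        [("user_id", "BIGINT"), ("url", "STRING")],
        location="/data/events",
        partitions=[("dt", "STRING")],
    ))
```

Dropping an external table removes only its metadata and leaves the files at the given location in place, which is the usual reason to choose it for data shared with other tools.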

Pros and Cons

Apache Hive is fault-tolerant.
Apache Hive's latest version supports ACID transactions.
Apache Hive supports UPDATE, DELETE and MERGE.

Apache Hive should support ROLLBACK, COMMIT operations.
Apache Hive should support XML SerDe.
Apache Hive.

Likelihood to Recommend

Well suited for accessing structured data and tables using SQL-like syntax. Hive is a good option for creating tables in different layers of a data lake. Not well suited for transactional databases.

June 02, 2021

Walk into the World of Big Data with Apache Hive

akshay kashyap

CONSULTANT

Deloitte Digital (Consumer Goods, 5001-10,000 employees)

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We are using Apache Hive over an on-premise big data setup built on top of Cloudera servers. We chose Apache Hive because it queries efficiently over a distributed system and runs queries faster through parallel execution. We save our metrics, such as user info, purchase history, transactions and preferences, in HDFS and use Apache Hive to query on top of it and run analytics to display output.

Pros and Cons

Simple query language built on top of the MapReduce paradigm.
Provides parallel execution over a distributed system.
Tabular format and connectors available for all cloud platforms.

Complex joins may take time to execute due to shuffling of data.
Static queries mostly.
Slower than Apache Spark by almost 100 times.
Dependent on external memory and storage to execute.

Likelihood to Recommend

You can use Apache Hive to query a large data warehouse that updates or appends records either in batches or in real time. Hive queries can give you output in the desired format, which you can use in any reporting tool such as Tableau or directly from Python.

September 23, 2020

Big Data the SQL way

Verified User

Engineer in Research & Development

Information Technology & Services Company, 11-50 employees

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

I am working as a Research Assistant where I have to process tons of data to produce appropriate findings. Our NLP lab used it for all its big data processing, for example: removing URLs, finding counts of specific words, etc. Mainly it assisted in all the processing and cleaning of the big datasets we collected for our research.

Pros and Cons

The SQL-like query language is very familiar to all the CS students. Hence, it's easy to use.
I used it on a server so I realize it is very scalable and can be used to process small and big datasets.
I particularly liked the UDF functionality where the user could define functions to produce particular output.

Transactions are not supported
The lack of subqueries meant some tasks could be done only by completing one query and then running the next
It is not as fast as Spark.

Likelihood to Recommend

Apache Hive is very well suited for those who are very familiar with SQL query syntax. Due to its easy-to-use syntax, it can really help in scenarios where a conventional database cannot be used for analysis of big datasets.

On the other hand, it's definitely slower than some other alternatives such as Spark. Also, it's not recommended for processing small datasets. Pandas and other ordinary data loading libraries can be useful for dealing with small datasets.

September 21, 2020

Apache Hive: Big data querying tool w/SQL interface, but slower, more costly computation

Kristjan Gannon

Senior Software Engineer

Sovrn Holdings, Inc. (Internet, 51-200 employees)

Score 7 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We use Apache Hive to make data-driven decisions. It is used from finance to engineering to sales. It helps aggregate our massive data sets into distilled information.

Pros and Cons

Flexibility through schema on read
Familiar SQL-like query language
Functions for complex queries and analysis

Slower processing than other tools on the market

Likelihood to Recommend

Apache Hive is useful for regularly reporting and analyzing data. In terms of ad-hoc analysis and debugging, the cycles can be quite long for querying, feedback, debugging queries, etc.

September 20, 2020

Hive: When SQL marries with Hadoop

Verified User

Strategist in Information Technology

Package/Freight Delivery Company, 10,001+ employees

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Hive plays a vital role in our company, together with Hadoop storage. It makes querying and aggregation much easier for data analysts with a traditional DBA background, while still benefiting a lot from the performance boost brought by Hadoop. It makes big data analysis more feasible and closer to the daily business context.

Pros and Cons

The SQL-like query interface is the core value and shining centerpiece of Hive.
It supports various data formats stored and also allows indexing.
It is fast.

No transaction support.
No sub-query support.
Can only deal with the cold data (non-real time).

Likelihood to Recommend

Hive is suitable for big data analysis tasks on top of historical data storage but is not quite suitable for real-time data (if that is the case, Cassandra should be considered). Because it is not real SQL, it is very good for read-only operations and on-the-fly aggregation; however, if data modification and transactions are needed, it is not suitable.

September 19, 2020

Manage data for your warehouse as strong as a beehive using Apache Hive!

Ananth Gouri

Assistant Professor

The National Institute of Engineering, Mysuru (Education Management, 501-1000 employees)

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

As we all know, Apache Hive sits on top of Apache Hadoop and is basically used for data-related tasks, mainly at a higher level of abstraction. I work as an Assistant Professor at NIE, Mysuru, and I have been a user of Apache Hive since the first time I taught Big Data Analytics as a PG course to my students.
It was one of those technical sessions, and I was supposed to demonstrate a word count program on a novel downloaded from Project Gutenberg. I was able to download the novel, load it into the Hadoop platform and execute a HiveQL (a SQL-like syntax used by Apache Hive) query to show a few unique words, their counts, and related examples.
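
The review does not show the query itself. As a hedged sketch of what such a word count demo might look like, the Python snippet below holds one plausible HiveQL query as a string and computes the same result in plain Python so the logic can be checked without a cluster. The table name `docs` and column name `line` are assumptions, as is the choice to split on single spaces and lowercase the text.

```python
from collections import Counter

# One plausible HiveQL word count; `docs` and `line` are hypothetical names.
# split() breaks each line into an array of words and explode() turns that
# array into one row per word, which is then grouped and counted.
WORD_COUNT_HIVEQL = """
SELECT word, COUNT(*) AS cnt
FROM docs
LATERAL VIEW explode(split(lower(line), ' ')) w AS word
WHERE word != ''
GROUP BY word
ORDER BY cnt DESC
"""


def word_count(lines):
    """Plain-Python equivalent of the query above, for checking the logic locally."""
    counts = Counter()
    for line in lines:
        for word in line.lower().split(" "):
            if word:  # mirrors WHERE word != ''
                counts[word] += 1
    return counts


if __name__ == "__main__":
    sample = ["The cat sat", "the  dog sat"]
    for word, cnt in word_count(sample).most_common():
        print(word, cnt)
```

Running the query for real would require a Hive connection and a table loaded with the novel's text, one line per row; neither is part of this sketch.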

Pros and Cons

The capability to handle large amounts of data and its querying process.
A syntax similar to SQL is an added advantage.
Active developer support and a community always ready to help.
Ease of usage.

Resource-consuming sometimes, maybe because I was using a larger object file.
Needs update or modify functionality; this is a minimal CRUD requirement.

Likelihood to Recommend

I would definitely recommend Apache Hive if a colleague asked. People working at academic institutions in particular can demonstrate programs like word count, tab count, space count, newline count, and other related programs with a basic HiveQL setup.

The only underlying problem could be that Apache Hive is designed to run on the Apache Hadoop ecosystem. People who are not comfortable with a Linux tree-structured file system, or who are unlikely to use a Linux OS, might not like to use Hive.

September 18, 2020

Reliable, cheap and trustworthy!

Nicolas Hubert

Machine Learning Engineer

Credit Suisse (Banking, 10,001+ employees)

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

It is only used in the IT department, mainly by IT engineers, Data Scientists, and Business Analysts with a technical background. It requires some time to master this tool, so this is only for engineer-related positions.

Pros and Cons

Reading databases
Writing databases
Storing databases
Distributed databases

Improvement techniques for handling Relational Data
Advanced optimizations
Transactions memory

Likelihood to Recommend

Apache Hive acts as a hub where information is stored and can be smoothly read and analyzed by BI analysts in order to make wise, data-driven decisions. Users can read, write and manage data, too. This only requires some intermediate SQL knowledge, and we all know learning SQL is quite easy. I cannot think of any scenario where Apache Hive would not be appropriate.

September 18, 2020

Apache Hive: SQL, open-source querying tool

Verified User

Analyst in Professional Services

Legal Services Company, 201-500 employees

Score 7 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Our company primarily uses Apache Hive to manage our data warehouse by being able to query multiple databases. We partition our tables and monitor query performance on very custom data queries using Hive. Hive is only used by our data analysts and an overseas data warehouse team, with only a few shared licenses existing on our virtual machines.

Pros and Cons

Monitor query performance
Manage tables in the data warehouse
Uses standard SQL

UI is quite dated and not intuitive
Open-source, so does not have consistent updates or support
Not the most optimal for ETL processes

Likelihood to Recommend

Apache Hive is well suited for organizations looking for an initial tool to begin their process of managing their data warehouse as it is open-source and relatively easy to set up. This works well with some legacy systems and many consoles support this. While Hive used to be quite revolutionary, it has fallen behind many other tools that are more performant or specialized for managing DBs, writing queries, and partitioning tables.

August 29, 2018

My Apache Hive Review

Kartik Chavan

Data Science Trainee

DivIHN Integration Inc (Electrical/Electronic Manufacturing, 1001-5000 employees)

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Apache Hive is being used in our company mainly for big data analysis. It has greatly helped us with data processing and analysis. It is being used across the whole organization. The business problem it addresses is storing large data sets and accessing them easily.

Pros and Cons

Querying in Apache Hive is very simple because it is very similar to SQL.
Hive produces good ad hoc queries required for data analysis.
Another advantage of Hive is that it is scalable.

Apache Hive isn't designed for and doesn't support online processing of data.
Subqueries are not supported.
Updating the data can be a problematic task.

Likelihood to Recommend

It is perfectly suited for analytics.

June 07, 2018

Hive is solid data analytical tool

Verified User

Engineer in Engineering

Information Technology and Services Company, 201-500 employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Hive is currently used in our company's data warehouse. It helps us give more structure to our data, and because Hive sits on top of Hadoop and its MapReduce (MR) engine, it is a big plus when you want to run a complex query and get faster results. This lets the Business Intelligence team use Hive as a self-service querying tool.

Pros and Cons

It's Fast!
You can store different kinds of data structures here, beyond the standard ones
Good scalability
Good redundancy too

It's not as ACID compliant as an RDBMS. It's a recently added feature and still needs work.
This is not the tool to go for online data processing.
It does not support sub-queries.
It can't process data in real time.

Likelihood to Recommend

This is best suited for data analysts and scientists; it's not a programmer's tool. You may still need an RDBMS to read data from, as updates and deletes can get a bit more complicated. You can run batch jobs, but this will have to be facilitated by additional tools.
It's good for fast query processing and for storing large amounts of data.

March 01, 2018

Hive - SQL-like query engine for big data platform

Verified User

Analyst in Corporate

Telecommunications Company, 10,001+ employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Hive is not used across the whole organization but by certain teams that need to query data from our big data store infrastructure, such as HDFS. It provides an interface to interact with and directly query HDFS, similar to the way we do it with any relational database. It is a powerful tool for querying big data.

Pros and Cons

Querying, joining and aggregating data
Built-in and user-defined functions
Speed
Support for other big data frameworks like Spark

Needs better user interfaces for browsing datastores and querying

Likelihood to Recommend

[Well suited for] enterprises that want to create data warehouses on top of the Hadoop ecosystem for reporting purposes, or to get summaries or aggregations from big data. In short, if you have implemented Hadoop then you need Hive.

February 17, 2018

One of the first SQL on Hadoop tools. Perhaps not the best.

Jordan Moore

Staff Consultant

Avalon Consulting, LLC (Information Technology and Services, 51-200 employees)

Score 7 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Hive allows us to run SQL queries against data sitting in Hadoop.

Pros and Cons

One of the standard SQL on Hadoop implementations. Comes installed in both HDP and CDH Hadoop distributions.
Hive LLAP (Live Long and Process) has recently made significant improvements on long-running queries.
Allows BI tools to run analysis over Hadoop data.
Allows various relational databases for its metastore. These include MySQL, Postgres, Derby, and Oracle.

Needs to keep up with execution engine improvements. Hive on Spark or Tez, then LLAP, are good starts.
Overall speed of ad-hoc querying could be improved.

Likelihood to Recommend

Hive is well-suited for providing an SQL engine on Hadoop, but there are alternative SQL on Hadoop projects that claim to have improvements over Hive.

October 25, 2017

Bringing Structure to your Unstructured Data

Bharadwaj (Brad) Chivukula

Sr.Technical Manager/Delivery Manager

Nisum Technologies, Inc. (Retail, 10,001+ employees)

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

1. In Retail, the business partners are more comfortable querying their own data instead of relying on Engineers. Hive solves one of those problems. The main purpose of using Hive is to build reports and analyze data that is stored in the Hadoop file system.
2. Events are gathered in HDFS by Flume and need to be processed into Parquet files for fast querying. The input data contains variable attributes in the JSON payload, as each customer can define custom attributes.

Pros and Cons

Hive syntax is almost like SQL, so for someone already familiar with SQL it takes almost no effort to pick up Hive.
Lets you run MapReduce jobs that parse JSON and generate dynamic partitions in Parquet file format.
Simplifies your experience with Hadoop, especially for non-technical/non-coding partners.
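
As a rough illustration of the JSON-to-dynamic-partition idea in the points above, here is a minimal Python sketch. It is not Hive code and not the reviewer's pipeline: the column names (`event_id`, `customer`, `event_date`) and the sample payloads are assumptions made up for the example. It only shows how each event's fixed fields can be separated from customer-defined attributes, and how rows can be grouped into Hive-style `column=value` partition directories.

```python
import json
from collections import defaultdict

# Hypothetical fixed columns; any other key in the payload counts as a custom attribute.
FIXED_COLUMNS = ("event_id", "customer", "event_date")


def split_event(raw_line):
    """Parse one JSON event into fixed columns plus a map of custom attributes."""
    payload = json.loads(raw_line)
    fixed = {name: payload.get(name) for name in FIXED_COLUMNS}
    custom = {k: v for k, v in payload.items() if k not in FIXED_COLUMNS}
    return fixed, custom


def partition_events(raw_lines, partition_column="event_date"):
    """Group events by partition value, mimicking Hive-style column=value directories."""
    partitions = defaultdict(list)
    for line in raw_lines:
        fixed, custom = split_event(line)
        directory = f"{partition_column}={fixed[partition_column]}"
        # The partition value lives in the directory name, not in the row itself.
        row = {k: v for k, v in fixed.items() if k != partition_column}
        row["attributes"] = custom
        partitions[directory].append(row)
    return dict(partitions)


# Made-up sample events with differing custom attributes.
sample = [
    '{"event_id": 1, "customer": "acme", "event_date": "2017-10-24", "color": "red"}',
    '{"event_id": 2, "customer": "acme", "event_date": "2017-10-25", "size": "L", "color": "blue"}',
    '{"event_id": 3, "customer": "zeta", "event_date": "2017-10-25"}',
]
result = partition_events(sample)
for directory in sorted(result):
    print(directory, result[directory])
```

Keeping the variable attributes in a single map-like column is one way to handle payloads whose keys differ per customer; other layouts are possible.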

Hive doesn't support many features that traditional RDBMS SQL has, so it may not be as easy a transition as one would presume.
Being open source, it has its share of problems and lack of support; you need to explore community groups to get clarifications if you are not using one of the big distribution providers like Cloudera or Hortonworks.
Hive is comparatively slower than its competitors. It's easy to use but that comes with the cost of processing. If you are using it just for batch processing then Hive is well and fine.

Likelihood to Recommend

We are trying to mine data from massive data sets for a wide variety of purposes (debugging production issues, creating business metrics, models, and forecasts, among other things). We have been able to do this very easily using our data warehouse and a combination of Hive and Pig. It makes things simpler for your BAs, as they are familiar with SQL and can adapt to Hive without much technical know-how.

September 11, 2017

Apache Hive for ETL workloads

Verified User

Analyst in Engineering

Hospital & Health Care Company, 501-1000 employees

Score 5 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Apache Hive is being used across our organisation for analytical workloads. We use Hive along with the Hortonworks distribution and it's a great SQL-on-Hadoop tool.

Pros and Cons

Hive is good for ETL workloads on Hadoop.
HiveQL translates SQL-like queries into MapReduce jobs. It supports plugging in custom MapReduce scripts.
Hive has two kinds of tables: Hive-managed tables and external tables.
Use an external table when other applications like Pig, Sqoop or MapReduce are also using the file in HDFS. When we drop an external table from Hive, it just deletes the metadata from Hive and the original file in HDFS stays.
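
The difference between the two table kinds can be sketched with a toy model in Python. This is not a Hive API: `ToyWarehouse`, its methods, and the paths are hypothetical names invented for the illustration. The sketch assumes the usual Hive behaviour that dropping a managed table removes its data along with the metadata, while dropping an external table removes only the metadata.

```python
class ToyWarehouse:
    """Toy stand-in for a Hive metastore plus a file system; not a Hive API."""

    def __init__(self):
        self.files = {}      # path -> file contents (stands in for HDFS)
        self.metastore = {}  # table name -> (path, is_external)

    def create_table(self, name, path, external=False, data=""):
        # Keep whatever is already at the path, as an external table definition would.
        self.files.setdefault(path, data)
        self.metastore[name] = (path, external)

    def drop_table(self, name):
        path, external = self.metastore.pop(name)
        if not external:
            # Managed table: the data goes away together with the metadata.
            self.files.pop(path, None)
        # External table: only the metastore entry was removed above.


wh = ToyWarehouse()
# A file that other tools (for example Pig or Sqoop) also read and write.
wh.files["/data/shared/clicks"] = "rows written by another tool"
wh.create_table("clicks_ext", "/data/shared/clicks", external=True)
wh.create_table("clicks_managed", "/warehouse/clicks_managed", data="rows loaded by Hive")

wh.drop_table("clicks_ext")
wh.drop_table("clicks_managed")
print(sorted(wh.files))
```

After both drops, only the shared file behind the external table is still present, which is why external tables are the safer choice when other applications depend on the same data.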

Use Hive for analytical workloads and write-once, read-many scenarios. Avoid updates and deletes.
Behind the scenes, Hive creates MapReduce jobs. Hive performance is slow compared to Apache Spark.
MapReduce writes intermediate outputs to disk, whereas Spark operates in memory and uses a DAG.

Likelihood to Recommend

Use it for ETL workloads. I prefer to repeat the same workload with Spark and decide which performs better.

April 26, 2017

Apache Hive - Querying Big Data Made Easy!

Verified User

Engineer in Engineering

Computer Software Company, 51-200 employees

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We use Apache Hive for two main use cases: analyzing our ever-growing data volume for insights and reports, and as part of our ETL pipeline, where we found that writing in SQL-like syntax allows for more rapid development while adding little complexity to the overall system.

Apache Hive solves a few issues for us, but the main one is the ability to analyze large volumes of data on S3 directly with overall strong performance. We have been able to analyze billions of records in a matter of minutes with a relatively small EC2 cluster using Apache Hive. It also allows our Data Analysts to simply write SQL and avoid the ramp-up needed to use other tools such as Apache Pig.

Pros and Cons

Apache Hive allows us to write expressive solutions to complex problems thanks to its SQL-like syntax.
Relatively easy to set up and start using.
Very little ramp-up to start using the actual product, documentation is very thorough, there is an active community, and the code base is constantly being improved.

Debugging can be messy with ambiguous return codes and large jobs can fail without much explanation as to why.
Hive is only SQL-like; while more features are being added, we have found that some things do not translate over (for example outer joins, inserts, columns can only be referenced once in a select, etc.).
For our ETL jobs it does not seem to be the optimal tool because tuning and performance are difficult; Apache Pig may be better for heavy processing jobs.

Likelihood to Recommend

Apache Hive shines for ad-hoc analysis and plugging into BI tools. Its SQL-like syntax allows for ease of use not only for engineers but also for data analysts. In our experience, there are probably more desirable tools to use if you are planning on integrating Hive into your processing pipeline.

February 28, 2017

Hive Away, but not for everything!

Praveen Murugesan

Engineering Manager - Ride Experience

Uber (Internet, 5001-10,000 employees)

Score 6 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We use Apache Hive across the whole organization. We built our own in-house Hadoop cluster for data warehousing purposes, complementary to the HP Vertica deployment we were already using. Vertica is limited in how far it scales, and to achieve true scalability and process trillions of records we had to invest in a new solution. Enter Apache Hive. We are very data-driven as an organization, so to satisfy people's appetite for data while sticking to something familiar for querying it (SQL), we decided to invest in Apache Hive as a starting point in our new data infrastructure.

Pros and Cons

Hive, which leverages traditional MapReduce at the core, can be used to process a large amount of data without a problem. Any problem that can be solved with MapReduce can now be simply expressed in SQL.
Hive leverages the disk in the case of processing large data and is not limited by physical memory of any one machine (which is a limitation for systems like Presto). Hence it even allows reasonable fact-fact cross joins.
Hive is extensible with UDFs. For any common patterns you can quickly write your own function set and it can be leveraged by everyone.
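
Hive UDFs are usually written in Java. A lighter-weight way to share custom logic, distinct from a true UDF, is a streaming script called through Hive's TRANSFORM clause, which passes rows to the script as tab-separated lines on standard input and reads rows back from standard output. The Python sketch below is only an illustration: the function names, the normalization rule, and the demo rows are assumptions, and in-memory streams stand in for the data Hive would send.

```python
import io


def normalize_rows(lines):
    """Yield tab-separated rows with the first column trimmed and lower-cased."""
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        columns[0] = columns[0].strip().lower()
        yield "\t".join(columns)


def run(stdin, stdout):
    """Read rows from stdin and write transformed rows to stdout, one per line."""
    for row in normalize_rows(stdin):
        stdout.write(row + "\n")


# Demo with in-memory streams; a real script would call run(sys.stdin, sys.stdout).
demo_in = io.StringIO(" Alice \t10\nBOB\t20\n")
demo_out = io.StringIO()
run(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Saved as a hypothetical `normalize.py` and registered with `ADD FILE`, a script like this could be invoked along the lines of `SELECT TRANSFORM(name, score) USING 'python normalize.py' AS (name, score) FROM some_table`, where the table and column names are placeholders.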

Compute Speed - Hive will be my last option to query vs. something like Presto, which has a much smarter query engine. Hive is slow, and I'd use it only if we cannot use something like Presto/Impala.
Hive's SQL syntax is unique and does not conform to ANSI SQL. This is quite painful for beginners.
The ability to upsert records would be nice to have. Hive is cumbersome for mutable data, where changes require partitions to be rewritten. No one has solved this really well. If this is solved, it could be leveraged by many systems.

Likelihood to Recommend

Process large datasets (especially joins of two large datasets, cross joins, etc.). Hive is not well suited for generic queries on one table, and it can still be very slow. There are better solutions for that (Presto, Impala).

Community Insights